1 / 35

Chapter 7. Query Processing

Chapter 7. Query Processing. Seoul National University Department of Computer Engineering OOPSLA Lab. CONTENTS. Introduction Traversal of Nodes in a Query Graph Cost Model. Introduction. Phases of Query Processing Query Graph Similarities between OO and Relational Queries Join Methods.

reba
Download Presentation

Chapter 7. Query Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 7. Query Processing Seoul National University Department of Computer Engineering OOPSLA Lab.

  2. CONTENTS • Introduction • Traversal of Nodes in a Query Graph • Cost Model

  3. Introduction • Phases of Query Processing • Query Graph • Similarities between OO and Relational Queries • Join Methods

  4. Object Algebra Expression Standardized Expression Declarative Query Declarative Optimizer Translator into Algebraic Structure Type Check Execution of the Chosen Plans Type Consistent Expression Optimized Expression Algebraic Optimizer Generator of Execution Plans Phases of Query Processing • Evaluating relational query in two phases • first phase for possible optimization • second phase for actual execution • Only small changes for OO query processing

  5. Terminology Associated with Query • Simple predicate • <name-attribute operator value> • value can be a primitive or an object • Complex predicate • has an adjacent sequence of attributes on a branch of the aggregation hierarchy of a class • Single operand query • Multiple operand query • Target class

  6. Query Graph in OODB • A subset of the schema graph(SG) • Definition of Query Graph (QG) for a single operand query on class C • QG is a connected subgraph of SG for C • QG includes only nodes of SG used in the query • an arc from attr. to a domain in SG can be changed into an arc to a subclass of the domain in QG • the set of arcs from C to the domains of the attributes of C form a direct graph rooted in C • leaf nodes of acyclic branch have simple predicate, whereas others can have simple or complex predicates

  7. Example of Query Graph select :T from Task :T where :T man_years > 2 and :T leader specialization IN {“DB”,”IS”} and :T participant group_name LIKE ‘%CAD%’

  8. Single Operand Query • In relational data model • on a single relation • In OO data model • aggregation hierarchy rooted on the target class • implicit join • inheritance hierarchy rooted on the target class • more powerful than a query on a single relation

  9. Explicit Join • Multiple operand query • Joins between arbitrary pairs of classes • Definition of explicit join between attr. Si, Sj of class Ci, Cj • Ci, Cj are both the root of QGs that correspond to a single operand query on Ci , Cj • QGs for Ci, Cj partially overlap and Si, Sj are both the root of a hierarchy of classes • Query can have more than one target class

  10. Similarities between OO and Relational Queries • Structure of queries is essentially same except for class hierarchies • Implicit and explicit joins between the classes are equivalent to joins in relations • Relational query processing techniques can be used on OO data model • all reasonable permutations between the relations • generation of an execution plan for each permutation • evaluation of the cost of each execution plan • in distributed DB, generation of a global plan

  11. Join Methods considering Class Permutation • Forward order join where class D is the domain of // attribute A of class C • Reverse order join for c in C do begin retrieve c.A; retrieve d in D such that OID(d) = c.A if A is single-valued OID(d) c.A if A is multi-valued; evaluate the predicate on d; end. for d in D do begin retrieve u = OID(d); evaluate the predicate on d; retrieve c in C such that c.A = u if A is single-valued uc.A if A is multi-valued ; end.

  12. Traversal of Nodes in a Query Graph • Visiting Classes in Query Graph • Access Algorithms to Instances of Classes • Query Execution Strategies • Query Execution Graph • Query Optimization • Differences in Query Processing compared with the Relational Model

  13. Methods of Visiting Classes in QG(1) • Forward traversal • C1 is the root class of QG • any depth-first order in QG • ex) (Task Group Researcher) (Task Researcher Group) • Reverse traversal • C1 is a leaf class of QG • Ci can be a leaf class of QG • all the children classes must be visited previously than the parent class • ex) (Group Researcher Task) (Researcher Group Task)

  14. Methods of Visiting Classes in QG(2) • Mixed traversal • C1 is any class of QG • Ci is connected to one of C1,..., Ci-1 by a direct arc towards Ci • Ci and one of C1 ,…, Ci-1 have a common parent in QG • one or more children class of Ci must be in C1,…, Ci-1 • ex) (Group Task Researcher) (Researcher Task Group)

  15. Access Algorithms to Instances of Classes • Nested-loop access method • each instance is processed separately and completely • attributes of an instance are evaluated with concerned predicates • the instance is passed to the parent or child class • Sort-domain (or sort-merge) methods • all instances are instantiated and processed together • they are passed all together to the next proper node • storage pages of instances are accessed only once

  16. Query Execution Strategies • Combining strategies for traversing QG and accessing instances • Nested-loop forward traversal(NLFT) • Sort-domain forward traversal(SDFT) • Nested-loop reverse traversal(NLRT) • Sort-domain reverse traversal(SDRT)

  17. Basic Operations Composing Query • [P] projection • [S] selection • [] intersection • [I] index scanning • [NJ] nested-loop join • [SJ] sort-domain join • [O] arrangement

  18. An Example of Query SELECT Task WHERE man_year > 2 AND Group.group_name LIKE ‘%CAD%’ AND Researcher.specialization IN {“DB”,”IS”}

  19. Query Execution Graph by NLFT group_name LIKE ‘%CAD%’ Man_year>2 JP(1) Group Task S NJ S Specialization IN {“DB” , “IS”} JP(2) Researcher Group P NJ S {Task.OID} {Task.OID} Researcher P JP(1):Task.participant=Group.OID JP(2):Task.leader=Researcher.OID

  20. Query Execution Graph by NLRT {Group.OID} Group S P Group_name LIKE ‘%CAD%’ {Task.OID} man_year>2 NJ P S P {Task.OID} Task JP(1)  Specialization IN {“DB” , “IS”} {Researcher.OID} Researcher S P {Task.OID} man_year>2 NJ P S P {Task.OID} JP(2) Task JP(1):Task.participant=Group.OID JP(2):Task.leader=Researcher.OID

  21. Considerations on Query Optimization • Too expensive to generate all possible execution plans • To reduce the complexity of choosing the best strategy • analyze the structure of DB • collect statistics on the frequencies of access • Use of methods in queries • difficult to estimate the cost and selectivity • possible to depend the evaluation order

  22. Differences in Query Processing compared with the Relational Model • Indices • on aggregation hierarchy • on inheritance hierarchy • Multiple-valued attributes • existential or universal quantifiers • Methods on queries • Database statistics • number of instances along class hierarchy • number of pages containing such instances

  23. Cost Model • Mathematical functions to determine the cost of the various types of traversal • Definition of Path • Model Parameters • Cost Functions

  24. Definition of Path • Given an aggregation hierarchy H,path P is defined as C1.A1.A2....An(n1) where • C1is a class in H • A1 is an attribute of class C1 • Ai is an attribute of class Ci in H,such that Ci is the domain of attribute Ai-1 of class Ci-1(1<in) • length(P) : the length of the path • classes(P) : the set of classes along the path • dom(P) : the domain of attribute Anof class Cn

  25. Logical Data Parameters(1) • nci :number of classes in the inheritance having class Ci-1 • Dij : average number of distinct values for attribute Aiof class Ci,j • Di :average number of values for attribute Ai, evaluated W.R.T all the members of class Ci,1 • Ni,j :cardinality of class Ci,j • Nhi :cardinality of the set of members of class Ci,1

  26. Logical Data Parameters(2) • fani,j : average number of references to members of class Ci+1,1 contained in attribute Ai of an instance of class Ci,j • fani : average number of references to members of class Ci+1,1 contained in attribute Ai of a member of class Ci,1 • di,j : average number of instances in class Ci,j • di : average number of members of class Ci,1 • ki,j : average number of instances in class Ci,j • khi : average number of members of class Ci,1

  27. Physical Data Parameters • Pij : number of pages for the instances of class Ci,j • Phi :number of pages for the members of class Ci,1 • ri : binary variable which assumes the value • 1, if the members of Ci,1 have reverse reference to the members of Ci-1,1 • 0, if not, 2  i  n

  28. Query Parameters • NIi : number of members of class Ci,1 on which the nested predicate must be evaluated • APi : number of pages effectively accessed to find the members of class Ci,1

  29. Derived Parameters • RefBy(i, s, y, k) : average # of values contained in Ay for a set of kinstances of Ci,s • RefByh(i, s, y, k) : ... for a set of kmembers of Ci,s • : average number of instances of class Ci,j • : average number of members of class Ci,1 • Ref(i, y, s, k) : average # of instances of class Ci,s with avalue in a set of k elements, 1  i  y  n, as the value of the nested Attr. Ay • Refh(i, y, k) : average # of members ...

  30. Assumptions for Cost Model • Independent distributions of values of the various attributes • Uniform distribution of the values of the each attribute • No relation between the cardinality of the instances of a class and that of the other classes in the same inheritance hierarchy

  31. Cost Function • Nested predicate C1.A1...An op exp • op is a relation operator • exp is an expression • evaluated with a forward or reverse traversal • Types of distribution of the values of attribute Ai • disjunctive distribution • each value of is Ai used in a single class • inclusive distribution

  32. NLFT Cost Functions 1. 2. 3. Assume the total # of instances = # of pages APk = NIk 4. Total Cost

  33. SDFT Cost Functions 1. 2. 3.Total cost

  34. NLRT Cost Functions 1. 2. 2.1. 2.2. 3. 4. Total cost

  35. SDRT Cost Functions 1. 1.1 2. 2.1 3. Total cost

More Related