1 / 15

CS4432: Database Systems II

CS4432: Database Systems II. Query Optimizer – Cost Based Optimization. SQL query. parse. parse tree. convert. answer. logical query plan. execute. apply laws. statistics. Pi. “improved” l.q.p. pick best. estimate result sizes. {(P1,C1),(P2,C2)...}. l.q.p. +sizes.

neith
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Query Optimizer – Cost Based Optimization

  2. SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..}

  3. sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Nested Loop Join) sid=sid Sailors Reserves A Query (Evaluation) Plan • An extended relational algebra tree • Annotations at each node indicate: • access methods to use for each table. • implementation methods used for each relational operator.

  4. How to cost a physical plan? • We need estimated size of intermediate results – Chapter 16.4 • Cost of each operator/algorithm – Chapter 15 • Buffer available for the query

  5. Result of cost-based optimization • Good physical plan • Consider different join orderings • Consider different access methods for accessing the relations

  6. How to generate that ‘good’ Physical Plan? Many alternate search algorithms are possible: • Exhaustive listing of all possible plans • Dynamic programming • Branch and bound • Greedy bottom-up plan construction NOTE: often only left-deep trees are being considered to keep the search space small.

  7. D D C C D B A C B A B A Why left-deep trees? • Fundamental decision in System R (IBM): • Only left-deep join treesare considered. • Left-deep trees can generate all fully pipelinedplans. • Intermediate results not written to temporary files. • Not all left-deep trees are fully pipelined (e.g., SM join).

  8. Enumeration of Left-Deep Trees • Left-deep trees differ in : • the order of relations, • the access method for each relation, and • the join method for each join. • Number of left deep plans still exponential – n relations implies n! left-deep tree orderings

  9. Enumeration of Left-Deep Trees • Enumerated using N passes (if N relations joined): • Pass 1: Find best 1-relation plan for each relation. • Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.) • Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.) • For each subset of relations, retain: • Cheapest plan overall, plus • Cheapest plan for each interesting orderof the tuples. Pass 1 A B C D Pass 2 Pass 3

  10. Enumeration Example • Example 16.3.3, Also read Chapter 16.6.5 • If too many relations (Chapter 16.6.6): • Dynamic Programming expensive if too many relations (say more than 6 relations). • Use greedy (faster algorithm, but may yield plans not as good as Dynamic Programming)

  11. Operator Types • Stateful versus stateless operators • Select is stateless • Join is stateful • Blocking versus non-blocking operators • Select is non-blocking • Agg functions are blocking • Pipelined versus non-pipelined operators • Select is pipelinable • What about Join ? (see next slide)

  12. Join? • Join : Revelation is that it depends on the implementation strategy chosen for an operator • Iteration-join : pipelinable • Merge-sort join : blocking • Index join : pipelinable • Hash join : blocking

  13. Costing of a complete plan • We went over an example query plan • Important: first we classify operators as pipelined or not-pipelined • If pipelined, then for stateless operators the IO cost is zero (for example, for Select or Project)

  14. Costing of a Complete Query Plan • What about a Select? How is it implemented? • If in middle of plan, pipeline it (one tuple at a time iteration) • If at leaf of plan, identify any potential index to use index-lookup to implement the Select • If index available, cost of implemention the select operator is equal to cost of an index lookup

  15. Costing of a complete plan • Main idea: • Determine # of distinct values – V(R,a) • Determine physical impl. Strategies per operator • Then, compute IO costs for each operator • Then, sum up all costs. • Done.

More Related