CS4432: Database Systems II

CS4432: Database Systems II Query Optimizer – Cost Based Optimization

SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..}

sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Nested Loop Join) sid=sid Sailors Reserves A Query (Evaluation) Plan • An extended relational algebra tree • Annotations at each node indicate: • access methods to use for each table. • implementation methods used for each relational operator.

How to cost a physical plan? • We need estimated size of intermediate results – Chapter 16.4 • Cost of each operator/algorithm – Chapter 15 • Buffer available for the query

Result of cost-based optimization • Good physical plan • Consider different join orderings • Consider different access methods for accessing the relations

How to generate that ‘good’ Physical Plan? Many alternate search algorithms are possible: • Exhaustive listing of all possible plans • Dynamic programming • Branch and bound • Greedy bottom-up plan construction NOTE: often only left-deep trees are being considered to keep the search space small.

D D C C D B A C B A B A Why left-deep trees? • Fundamental decision in System R (IBM): • Only left-deep join treesare considered. • Left-deep trees can generate all fully pipelinedplans. • Intermediate results not written to temporary files. • Not all left-deep trees are fully pipelined (e.g., SM join).

Enumeration of Left-Deep Trees • Left-deep trees differ in : • the order of relations, • the access method for each relation, and • the join method for each join. • Number of left deep plans still exponential – n relations implies n! left-deep tree orderings

Enumeration of Left-Deep Trees • Enumerated using N passes (if N relations joined): • Pass 1: Find best 1-relation plan for each relation. • Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.) • Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.) • For each subset of relations, retain: • Cheapest plan overall, plus • Cheapest plan for each interesting orderof the tuples. Pass 1 A B C D Pass 2 Pass 3

Enumeration Example • Example 16.3.3, Also read Chapter 16.6.5 • If too many relations (Chapter 16.6.6): • Dynamic Programming expensive if too many relations (say more than 6 relations). • Use greedy (faster algorithm, but may yield plans not as good as Dynamic Programming)

Operator Types • Stateful versus stateless operators • Select is stateless • Join is stateful • Blocking versus non-blocking operators • Select is non-blocking • Agg functions are blocking • Pipelined versus non-pipelined operators • Select is pipelinable • What about Join ? (see next slide)

Join? • Join : Revelation is that it depends on the implementation strategy chosen for an operator • Iteration-join : pipelinable • Merge-sort join : blocking • Index join : pipelinable • Hash join : blocking

Costing of a complete plan • We went over an example query plan • Important: first we classify operators as pipelined or not-pipelined • If pipelined, then for stateless operators the IO cost is zero (for example, for Select or Project)

Costing of a Complete Query Plan • What about a Select? How is it implemented? • If in middle of plan, pipeline it (one tuple at a time iteration) • If at leaf of plan, identify any potential index to use index-lookup to implement the Select • If index available, cost of implemention the select operator is equal to cost of an index lookup

Costing of a complete plan • Main idea: • Determine # of distinct values – V(R,a) • Determine physical impl. Strategies per operator • Then, compute IO costs for each operator • Then, sum up all costs. • Done.

CS4432: Database Systems II

CS4432: Database Systems II

Presentation Transcript

DATABASE CONCEPTS

Database Operations on GPU

Compiler Concepts for Database Systems

Database Systems: Design, Implementation, and Management Ninth Edition

Mobile Database Systems

Database Security and Authorization

Object oriented Database

DB Programming

Chapter 6 DATABASES, DATA WAREHOUSES AND OLAP

Database Access using SQL

Database Management Systems

INFORMATION SYSTEMS IN BIOLOGY: FOCUS ON DATABASE

Progress Database Admin

Microsoft Access

INFO2120 – INFO2820 – COMP5138 Database Systems

Database Theory

Oracle Database Systems

Chapter 2- From Systems to Descriptive Models

Oracle Database 12c Release 1 (12.1.0.2 )

ICOM 5016 – Introduction to Database Systems